The Algorithm of Syntactic Analysis Based on Grammatical Rules




The paper presents an algorithm of syntactic analysis that builds a dependency tree for a simple, expanded, uncomplicated Russian sentence. The algorithm finds pairs of words between which a syntactic connection is possible according to grammatical rules. Templates of the minimal structure schemes of sentences are used to identify the predicative base of the sentence.
Key words: syntactic analysis; grammatical rules; Russian sentence; predicative base of sentence.




Automatic syntactic analysis of Russian texts is difficult because of two features of the Russian language: free word order and homonymy at the morphological and higher levels of the language. The problem is therefore usually solved with statistical methods, which rely on large annotated text corpora. Creating such corpora is time-consuming. Moreover, solving particular text analysis tasks by statistical methods alone contributes little that is new to fundamental linguistics. At present, tools for automatic text analysis that are based on linguistic methods are not sufficiently developed, which is what makes this research relevant.

The object of research is a simple sentence of the Russian language that is expanded but not complicated (semi-composite). This means that the sentence does not contain the following constructions: parentheses, absolute participial clauses, homogeneous parts of the sentence. The sentences considered in this article also must not contain conjunctions, connective words, particles or interjections.

The subject of research is the method of building a dependency tree for the sentence.

The goal of research is to create a syntactic analysis algorithm based on grammatical rules.

The proposed sentence processing algorithm consists of the following stages.

1. Morphological analysis of wordforms.

2. Searching for pairs of potentially connected wordforms in the sentence.

3. Reducing the number of pairs of potentially connected wordforms.

4. Building the dependency tree.

Morphological analysis of wordforms.

Morphological analysis is performed with the module of morphological analysis of Russian words RDMA IAI. The module is a dynamic link library for Windows. Its database contains the paradigms of Russian words. Each paradigm is a set of wordforms connected with their lemma (dictionary form); the lemma is itself considered a wordform. Every wordform is represented by a pair: spelling and morphological information (MI). In this paper, the term «morphological information» means a set of values of grammatical categories (e.g. Person: 1st, 2nd or 3rd person; Number: singular or plural; Case: nominative, genitive, etc.).

The RDMA IAI module solves the following problems: normalization of a wordform to its dictionary form (lemma) and synthesis of all wordforms (the paradigm) of a word.

The function that normalizes a wordform returns an array of pairs: the lemma's spelling and the MI of the wordform. After the stage of morphological analysis, the sentence $S$ consisting of $N$ wordforms is represented by the vector

$$S = (s_1, s_2, \dots, s_i, \dots, s_N). \tag{1}$$

Here $i$ is the number of the wordform in the sentence and $s_i$ is the array of the $n_i$ interpretations of the $i$-th wordform:

$$s_i = \{s_i^1, s_i^2, \dots, s_i^j, \dots, s_i^{n_i}\}. \tag{2}$$

Each interpretation $s_i^j$ is represented by a pair: the lemma's spelling $w_i^j$ and the morphological information $m_i^j$ of the wordform:

$$s_i^j = (w_i^j, m_i^j). \tag{3}$$

Searching for pairs of potentially connected wordforms in the sentence.

At the second stage we search for pairs of potentially connected interpretations of wordforms. Let us introduce a relation $\pi(x, y, t)$. It takes the value 1 if a connection of type $t$ is possible between the interpretations of wordforms $x$ and $y$, where $x \in s_i$, $y \in s_j$, $i \neq j$, $t \in T$. Here $x$ is the main word of the syntactic connection, $y$ is the dependent word, and $T$ is the set of syntactic connection types.
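The sentence representation (1)-(3) can be sketched in Python as a list of interpretation lists. This is only an illustration: the class name `Interpretation`, the helper `interp`, the category names and the sample analyses are assumptions made for the example and are not the data format of the RDMA IAI module.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interpretation:
    """One interpretation s_i^j = (w, m): lemma spelling and morphological information."""
    lemma: str
    mi: tuple  # sorted (category, value) pairs, kept as a tuple so the object is hashable


def interp(lemma, **mi):
    """Build an interpretation from keyword arguments (hypothetical helper)."""
    return Interpretation(lemma, tuple(sorted(mi.items())))


# Hypothetical analyses of "ночь была тихая"; a real analyser may return other sets.
S = [
    [interp("ночь", pos="noun", case="nom", number="sing"),
     interp("ночь", pos="noun", case="acc", number="sing")],   # s_1: two readings
    [interp("быть", pos="copula", tense="past", number="sing")],  # s_2
    [interp("тихий", pos="adj", case="nom", number="sing")],      # s_3
]

N = len(S)                    # number of wordforms in the sentence
n = [len(s_i) for s_i in S]   # n_i: number of interpretations of the i-th wordform
```

Keeping every reading of a wordform in $s_i$ postpones the resolution of homonymy to the later stages, where grammatical rules decide which readings can take part in a connection.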

The set of syntactic connection types $T$ is the union of two disjoint subsets $T_m$ and $T_a$:

$$T = T_m \cup T_a, \quad T_m \cap T_a = \varnothing.$$

Here $T_a$ is the set of types of relations with the minor sentence parts (categorial agreement, government, joining). $T_m$ is the set of types of relations between the principal sentence parts. We build this set using the minimal structure schemes (MSS) [1, p. 742-727], which describe the predicative base of Russian sentences. The following notation is used in Table 1. The predicative component of the sentence is presented as: a finite verb ($V_f$); a finite copulative verb ($Cop_f$) (e.g. «быть», "to be"; «казаться», "to seem"; «становиться», "to become"); an infinitive ($Inf$) conveying a specific modal meaning; impersonal forms of the copulative verb, i.e. the singular or plural of the copula in the 3rd person ($Cop_{s3}$, $Cop_{pl3}$). The nominal component of the sentence is presented by nominal forms and adverbs: noun forms of the nominative and instrumental cases ($N_{1/5}$) as well as non-prepositional and prepositional forms of any oblique case that can combine with the copula ($N_{2 \dots pr}$); adjective and passive participle forms of the nominative and instrumental cases, the short form and the comparative degree of the adjective ($Adj_{1/5/f}$); adverbs that can combine with the copula ($Adv_{pr}$).


Table 1 — The minimal structure schemes 


No | Minimal structure scheme | Examples of sentences
1 | $N_1\ V_f$ | Грачи прилетели.
2 | $N_1\ Cop_f\ Adj_{1/5/f}$ | Ночь тихая (тиха). Ночь была тихая (тихой, тиха). Ночь была тише дня.
3 | $N_1\ Cop_f\ N_{1/5}$ | Он — студент. Он был студент. Он был студентом.
4 | $N_1\ Cop_f\ N_{2 \dots pr}/Adv_{pr}$ | Дом был с лифтом. Чай — с сахаром. Глаза — навыкате. Глаза были навыкате. Подарок — сыну. Подарок был сыну.
5 | $Inf\ V_f$ | Курить воспрещалось.
6 | $Inf\ Cop_f\ N_{1/5}$ | Дозвониться — проблема (было проблемой). Любить иных — тяжелый крест.
7 | $Inf\ Cop_f\ Adj_{1/5/f}$ | Промолчать — разумно. Промолчать — самое разумное. Промолчать было разумно. Промолчать было самым разумным.
8 | $Inf\ Cop_f\ N_{2 \dots pr}/Adv_{pr}$ | Молчать было в его правилах. Молчать — в его правилах. Молчать было некстати. Идти трудно. В магазин идти (было) сыну.
9 | $Inf\ Cop_f\ Inf$ | Отказаться было обидеть.
10 | $V_{f3s}$ | Смеркается. Ему нездоровится.
11 | $V_{f3pl}$ | Его обидели. В классе зашумели.
12 | $Cop_{pl3}\ N_{2 \dots pr}/Adv_{pr}$ | От него были в восторге. С ним были запросто.
13 | $Cop_f\ N_1$ | Будет дождь. Была зима. Осень.
14 | $Cop_{s3}\ Adj_{fsn}$ | Было темно.
15 | $Cop_{pl3}\ Adj_{fpl}$ | Результатом были довольны.
16 | $Inf$ | Быть по-вашему.
17 | $Cop_{s3}\ N_{2 \dots pr}/Adv_{pr}$ | Будет без осадков. Было поздно.


Let us describe each MSS by several templates (see Table 2). A template is a sequence of notations of rules that define whether a relation $t \in T_m$ between $x$ and $y$ is possible. These rules are described in Tables 3 and 4. We build templates only for sentences whose predicative base consists of two or more words. Therefore, Table 2 does not contain a template for MSS No. 16.

Table 3 contains simple rules that define relations $\pi(x, y, t)$ between the principal sentence parts ($t \in T_m$). Such a rule decides whether a syntactic connection is potentially possible using only the following information:

— the part of speech of the principal word $x$ and the part of speech of the dependent word $y$;
— the order of the principal word $x$ and the dependent word $y$ in the sentence ('direct': $x$ stands before $y$; 'indirect': $y$ stands before $x$; 'any': the order is unimportant);
— whether a dash must be placed between the principal word and the dependent one.
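A simple rule of this kind can be sketched as a small record plus a check function. The sketch below is an assumption-laden illustration, not the author's implementation: the names `SimpleRule` and `simple_rule_holds`, the part-of-speech labels and the reading of the dash flag as "a dash is required" are all hypothetical, and the two sample rules follow the values legible in Table 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleRule:
    name: str     # notation of the rule
    pos_x: str    # part of speech of the principal word x
    pos_y: str    # part of speech of the dependent word y
    order: str    # "direct", "indirect" or "any"
    dash: bool    # True if a dash is required between x and y (assumed meaning)


def simple_rule_holds(rule, pos_x, pos_y, idx_x, idx_y, dash_between):
    """Return True if the connection named by `rule` is potentially possible."""
    if (pos_x, pos_y) != (rule.pos_x, rule.pos_y):
        return False
    if rule.order == "direct" and not idx_x < idx_y:
        return False
    if rule.order == "indirect" and not idx_y < idx_x:
        return False
    if rule.dash and not dash_between:
        return False
    return True


# Sample rules with values as read from Table 3 (labels are illustrative).
KN_PRED = SimpleRule("_KN_Pred", "noun", "adverb", "any", True)
KCP = SimpleRule("_KCP", "copula", "preposition", "direct", False)

# "Глаза — навыкате": noun at position 0, adverb at position 1, dash in between.
ok = simple_rule_holds(KN_PRED, "noun", "adverb", 0, 1, dash_between=True)
```

Because such a rule looks only at parts of speech, word order and the dash, it can be checked before any grammatical categories are compared.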


Table 2 — Templates of the minimal structure schemes 


MSS | MSS template | Example
1 | _K1 | Грачи прилетели.
2 | _K2 | Ночь тихая (тиха).
2 | _KNC_L + _KCAdj | Ночь была тихая (тихой, тиха).
2 | | Ночь тише дня.
3 | _K3 | Он — студент.
3 | _KNC_L + _KCN | Он был студент.
3 | _KNC_L + _K3_6 | Он был студентом.
4 | _KN1_Prep + _K_Pr_Nobj | Чай — с сахаром.
4 | _KN_Pred | Глаза — навыкате.
4 | _K_NomObj | Подарок — сыну.
4 | _KNC_L + _KCP + _K_Pr_Nobj | Дом был с лифтом.
4 | _KNC_L + _KC_Pr | Глаза были навыкате.
5 | _K5 | Курить воспрещалось.
6 | _KCI_Nom + _K3_6 | Дозвониться было проблемой.
6 | _KCI_Nom + _KCN | Дозвониться была проблема.
6 | _K6 | Дозвониться — проблема.
7 | _KCI_Nom + _KCAdj | Промолчать было разумно.
7 | _K7 | Промолчать — разумно.
8 | _KI_Pred + _K_Pr_Nobj | Молчать — в его правилах.
8 | _KI_NomObj | В магазин идти сыну.
8 | _KI_Pred | Молчать некстати.
8 | _KCI_Nom + _KCP + _K_Pr_Nobj | Молчать было в его правилах.
8 | _KCI_Nom + _KC_Pr | Молчать было некстати.
9 | _KCI_Nom + _KCI | Отказаться было обидеть.
9 | _K9 | Отказаться — обидеть.
12 | _KCP + _K_Pr_Nobj | От него были в восторге.
12 | _KC_Pr | С ним были запросто.
13 | _KCN | Будет дождь.
14 | _K14 | Было темно.
15 | _K15 | Результатом были довольны.
17 | _K_Pr_Nobj | Без осадков.
17 | _KCP + _K_Pr_Nobj | Будет без осадков.
17 | _KC_Pr | Было поздно.
17 | _K17 | Цветов было!


The symbol * (see Tables 3 and 5) in the notation of a rule indicates that the case of the dependent word $y$ is governed by the preposition $x$.
Table 3 — Notation of simple rules that define syntactical connections $t \in T_m$


Notation of rule | Principal word (x) | Dependent word (y) | Dash | Order of words
_K5 | Verb (infinitive) | Verb (non-infinitive) | | (illegible)
_K9 | Verb (infinitive) | Verb (infinitive) | + | (illegible)
_KI_Pred | Verb (infinitive) | Adverb | | direct
_KI_Prep | Verb (infinitive) | Preposition | - | direct
_KC_Pr | Copula | Adverb | | any
_KN_Pred | Noun | Adverb | + | any
_KCI | Copula | Verb (infinitive) | | (illegible)
_KCI_Nom | Copula | Verb (infinitive) | | indirect
_KCP | Copula | Preposition | | direct
_K_Pr_Nobj* | Preposition | Noun | | direct


Table 4 contains more complex rules that define relations $\pi(x, y, t)$ between the principal sentence parts ($t \in T_m$). In addition to the parts of speech, these rules use the morphological information of the principal and dependent words, their order in the sentence, the parts of speech allowed for words standing between the principal word and the dependent one (separators), and whether a dash must be placed between the principal and dependent words. We use the following notation for the grammatical categories of the morphological information and their values:

— Number ($N_x$, $N_y$) takes the values 'singular' ('sing.') and 'plural' ('pl.');

— Case ($C_x$, $C_y$) takes the values 'nominative' ('nom.'), 'genitive' ('gen.'), 'dative' ('dat.'), 'accusative' ('acc.'), 'instrumental' ('in.'), 'locative' ('loc.');

— Tense ($T_x$, $T_y$) takes the values 'past', 'present' ('pres.'), 'future' ('fut.');

— Gender ($G_x$, $G_y$);

— Person ($F_x$, $F_y$) takes the values 'first' (1), 'second' (2), 'third' (3);

— Form of adjective ($AdjF_x$, $AdjF_y$) takes the values 'the positive degree', 'the comparative degree' ('comp.'), 'the superlative degree', 'short form' ('short').

The rules also include the logical operators 'AND' (&), 'OR' (v), 'NOT' (!), and wOrder, the order of words ('direct', 'indirect').
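A complex rule can be read as a Boolean predicate over the morphological information of $x$ and $y$. The sketch below encodes one rule, _K3_6, under the reading $N_x = N_y$ & $C_y$ = 'in.' with a copula as $x$ and a noun as $y$; this reading of the rule, the dictionary layout and the function name `k3_6` are assumptions made for illustration.

```python
def k3_6(x, y):
    """Sketch of rule _K3_6 (assumed reading): N_x = N_y AND C_y = 'in.'."""
    return (
        x["pos"] == "copula"
        and y["pos"] == "noun"
        and x["number"] == y["number"]   # N_x = N_y
        and y["case"] == "in."           # C_y = 'in.' (instrumental)
    )


# Hypothetical morphological information for "был", "студентом", "студент".
byl = {"pos": "copula", "number": "sing.", "tense": "past"}
studentom = {"pos": "noun", "number": "sing.", "case": "in."}
student = {"pos": "noun", "number": "sing.", "case": "nom."}

accepted = k3_6(byl, studentom)   # "был студентом": instrumental predicate noun
rejected = k3_6(byl, student)     # nominative noun is left to another rule
```

Writing each rule as a predicate keeps the operators &, v and ! of the rule notation in one-to-one correspondence with `and`, `or` and `not`.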

Let us now consider the rules that belong to the set $T_a$. They identify relations with the minor sentence parts: categorial agreement, government, joining (see Tables 5 and 6).
Table 4 — Notation of complex rules that define syntactical connections $t \in T_m$


Notation of rule | Principal word (x) | Dependent word (y) | Separators | Rule
_K1 | Verb (non-infinitive) | Noun | Any | N_x=N_y & C_y='nom.' & ((T_x='past' v F_x=3) & (N_x='sing.' & G_x=G_y v N_x='pl.') v T_x='fut.' v T_x='pres.')
_K1 | Verb (non-infinitive) | Personal pronoun | Any | N_x=N_y & C_y='nom.' & ((T_x='past' v F_x=3) & (N_x='sing.' & G_x=G_y v N_x='pl.') v F_x=F_y & (T_x='fut.' v T_x='pres.'))
_K1 | Verb (non-infinitive) | Cardinal numeral | Any | (illegible)
_K7 | Verb (infinitive) | Adjective | (illegible) | (C_y='nom.' v AdjF_y='short' v AdjF_y='comp.') & Dash
_KI_NomObj | Verb (infinitive) | Noun v Adjective | Any | C_y='nom.'
_K_NomObj | (illegible) | Noun | !Verb | (illegible)
_K14 | Copula | Adjective | (illegible) | N_x=N_y & N_y='sing.' & F_x=3 & AdjF_y='short'
_K15 | Copula | Adjective | (illegible) | N_x=N_y & N_y='pl.' & F_x=3 & AdjF_y='short'
_KCAdj | Copula | Adjective | !Verb | ((C_y='nom.' v C_y='in.') & (T_x='past' & G_x=G_y v T_x='fut.' v T_x='pres.')) v AdjF_y='short' v AdjF_y='comp.'
_K3_6 | Copula | Noun | Any | N_x=N_y & C_y='in.'
_KCN | Copula | Noun | !Verb | wOrder='direct' & N_x=N_y & C_y='nom.' & ((T_x='past' v F_x=3) & (N_x='sing.' & G_x=G_y v N_x='pl.') v T_x='fut.' v T_x='pres.')
_KNC_L | Copula | Noun | !Verb | wOrder='indirect' & N_x=N_y & C_y='nom.' & ((T_x='past' v F_x=3) & (N_x='sing.' & G_x=G_y v N_x='pl.') v T_x='fut.' v T_x='pres.')
_KC_Pr | (illegible) | Noun | !Verb | C_y='gen.' v C_y='dat.' v C_y='acc.'
_K2 | Noun | Adjective | !Verb | C_x='nom.' & (N_x=N_y & G_x=G_y & (C_y='nom.' v AdjF_y='short') v AdjF_y='comp.')
_K3 | Noun | Noun | !Verb | C_x='nom.' & C_y='nom.' & Dash
_KN1_Prep | Noun | Preposition | Any | C_x='nom.' & Dash
_K6 | Verb (infinitive) | Noun | !Verb | C_y='nom.' & Dash
_K17 | Copula | Noun | !Verb | C_y≠'nom.'
Table 5 — Notation of simple rules that define syntactical connections $t \in T_a$


Notation of rule | Principal word (x) | Dependent word (y) | Separators | Order of words
ta1 | Verb | Adverb | Any | Any
ta2 | Verb | Adverbial participle | !Verb | Any
ta3 | Verb | Verb (infinitive) | !Verb | Any
ta4 | Verb | Preposition | !Verb | Any
ta5 | Noun | Preposition | Adjective v Participle | Any
ta6 | Noun | Verb (infinitive) | Adjective v Participle | Direct
ta7 | Adjective | Adverb | None | Any
ta8 | (illegible) | Preposition | None | Direct
ta9 | (illegible) | Verb (infinitive) | Adverb | Direct
ta10* | Preposition | Noun | Adjective v Participle | Direct
ta11 | Adverb | Adverb | None | Direct
ta12 | (illegible) | Preposition | None | Direct
Table 6 — Notation of complex rules that define syntactical connections $t \in T_a$

Notation of rule | Principal word (x) | Dependent word (y) | Separators | Order of words | Rule
ta13 | Verb | Noun | Any | Any | C_y≠'nom.'
ta14 | Noun | Adjective | Any | Any | (C_x=C_y v AdjF_y='comp.') & N_x=N_y & (N_x='pl.' v G_x=G_y)
ta15 | Noun | Noun | Adjective v Participle | Direct | C_y≠'nom.'
ta16 | Adjective | Noun | Adjective v Participle | Direct | C_y='gen.' v C_y='dat.' v C_y='in.'


The rules presented in Tables 3-6 define the set of triples $(x, y, t)$ for which $\pi(x, y, t) = 1$. All types of syntactic connections can be expressed with such triples. A syntactic connection between $x$ and $y$ that is established through a connective word $z$ (prepositional government) is expressed with two triples: $(x, z, t_1)$, $(z, y, t_2)$.

We search for pairs of potentially connected wordforms in the sentence using these rules. The pairs found for the sentence $S$ (2)-(3) are stored as the set $R$ of triples $(x, y, t)$:

$$R = \{(x, y, t) \mid \pi(x, y, t) = 1,\ x \in s_i,\ y \in s_j,\ i \neq j\}. \tag{4}$$
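The construction of $R$ amounts to testing every rule on every ordered pair of interpretations that belong to different wordforms. A minimal sketch follows; the function name `build_R`, the rule signature and the toy rule `subj` are assumptions for illustration. Interpretations are addressed by index pairs `(i, j)` so that identical wordforms at different positions stay distinct.

```python
from itertools import permutations


def build_R(S, rules):
    """Collect triples (x, y, t) with pi(x, y, t) = 1, x in s_i, y in s_k, i != k.

    S     : list of lists of interpretations (any objects)
    rules : dict mapping a connection type t to a predicate rule(x, y, i, k)
    """
    R = set()
    for i, k in permutations(range(len(S)), 2):      # all ordered pairs with i != k
        for a, x in enumerate(S[i]):
            for b, y in enumerate(S[k]):
                for t, rule in rules.items():
                    if rule(x, y, i, k):
                        R.add(((i, a), (k, b), t))
    return R


# Toy input: a noun with two case readings followed by a verb.
S = [
    [{"pos": "noun", "case": "nom."}, {"pos": "noun", "case": "acc."}],
    [{"pos": "verb"}],
]
# Hypothetical rule: a verb takes a nominative noun as its dependent.
rules = {
    "subj": lambda x, y, i, k: x["pos"] == "verb"
    and y["pos"] == "noun" and y["case"] == "nom.",
}
R = build_R(S, rules)
```

In this toy run only the nominative reading of the noun enters $R$, which is what later allows the accusative reading to be discarded.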


Reducing the number of pairs of potentially connected wordforms.

The set of first components of the triples in $R$ is denoted by $A$ (the set of principal words), and the set of second components is denoted by $B$ (the set of dependent words):

$$A = \{x : \exists (x, y, t) \in R\}, \quad B = \{y : \exists (x, y, t) \in R\}. \tag{5}$$


A dependency tree can be built for the sentence if every wordform is connected with one or more other wordforms by a syntactic connection.

Let us introduce the criterion of connectedness of the sentence: “At least one interpretation of each wordform must belong to the set of principal words or to the set of dependent words.”

$$\forall i = \overline{1, N}\ \ \exists z \in s_i : z \in (A \cup B). \tag{6}$$

A sentence that does not satisfy criterion (6) is not syntactically connected; it may have been written with an error. The analysis of such sentences stops.

Let us form the vector $S'$ that describes the sentence $S$. Each element $s'_i$ of $S'$ is a subset of $s_i$ whose members belong to the set of principal words $A$ or to the set of dependent words $B$:

$$S' = (s'_1, s'_2, \dots, s'_i, \dots, s'_N), \quad s'_i \subseteq s_i : \forall z \in s'_i\ \ z \in (A \cup B). \tag{7}$$

Thus, using the vector $S'$ instead of $S$ as the representation of the sentence reduces the number of wordform interpretations.
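The connectedness check and the reduction of the sentence representation can be sketched together. The function names and the use of `(i, j)` index pairs as identifiers of interpretations are assumptions of this sketch; `None` is returned when the connectedness criterion fails.

```python
def principal_and_dependent(R):
    """A: first components of the triples in R, B: second components."""
    A = {x for x, _, _ in R}
    B = {y for _, y, _ in R}
    return A, B


def reduce_sentence(n_interp, R):
    """Build S' from the numbers of interpretations n_i and the set R.

    Interpretations are identified by (i, j) pairs. Returns None when some
    wordform has no interpretation in A or B (criterion of connectedness fails).
    """
    A, B = principal_and_dependent(R)
    used = A | B
    S_prime = [
        [(i, j) for j in range(n_i) if (i, j) in used]
        for i, n_i in enumerate(n_interp)
    ]
    if any(not s for s in S_prime):
        return None
    return S_prime


# Wordform 0 has two readings, wordform 1 has one; only reading (0, 0) is connected.
R = {((1, 0), (0, 0), "subj")}
S_prime = reduce_sentence([2, 1], R)
```

Readings that take part in no potential connection are dropped here, before any combinations of readings are enumerated.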


Using the representation $S'$ we build the set of morphological markings of the sentence. The set $D$ of morphological markings is obtained as the Cartesian product of the sets that are the elements of the vector $S'$:

$$D = s'_1 \times s'_2 \times \dots \times s'_N, \quad D = \{d_k\}, \quad d_k = (d_k^1, d_k^2, \dots, d_k^N),\ d_k^i \in s'_i. \tag{8}$$
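The set of markings is a plain Cartesian product, so in Python it can be sketched with `itertools.product`. The function name `markings` and the `(i, j)` identifiers of interpretations are assumptions of this illustration.

```python
from itertools import product


def markings(S_prime):
    """Return all markings d_k: one interpretation chosen for every wordform."""
    return list(product(*S_prime))


# Three wordforms with 2, 1 and 2 surviving interpretations respectively.
S_prime = [[(0, 0), (0, 1)], [(1, 0)], [(2, 0), (2, 2)]]
D = markings(S_prime)
```

The number of markings is the product of the sizes of the $s'_i$ (here 2 · 1 · 2 = 4), which is why removing unconnected interpretations beforehand pays off.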


Most of the morphological markings $d_k$ are invalid. We reject such markings using the criteria outlined below.

To apply the criterion of connectedness to a morphological marking $d_k$, let us create the following sets: $F_k$, the set of components of the marking $d_k$; $R_k$, the subset of $R$ (4) that contains the pairs of interpretations $d_k^i$ and $d_k^j$ potentially connected by a syntactic relation; $A_k$, the set of principal words of these pairs; $B_k$, the set of dependent words.

$$\begin{gathered} F_k = \{d_k^i\},\ i = \overline{1, N}; \\ R_k = \{(x, y, t) : (x, y, t) \in R,\ x \in F_k,\ y \in F_k\}; \\ A_k = \{x : (x, y, t) \in R_k\}; \\ B_k = \{y : (x, y, t) \in R_k\}. \end{gathered} \tag{9}$$

The following condition checks whether the marking $d_k$ satisfies the criterion of connectedness:

$$d_k^i \in (A_k \cup B_k),\ i = \overline{1, N}. \tag{10}$$


A morphological marking $d_k$ that does not satisfy criterion (10) is unacceptable.

For the sentences considered in the paper, a dependency tree can be built if the marking $d_k$ satisfies the following criterion: “The number of wordforms that belong to the set of principal words but do not belong to the set of dependent words must not exceed 1.”

$$|A_k \setminus B_k| \le 1. \tag{11}$$

The next criterion deals with prepositional government. Let $P$ be the set of prepositions of the Russian language. The criterion is as follows: “Prepositions belong both to the set of principal words of the sentence and to the set of dependent words of the sentence.”

$$\forall z \in P \cap F_k : z \in B_k \cap A_k. \tag{12}$$

We continue the analysis of the morphological marking $d_k$ only if it satisfies criteria (10)-(12).
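The three criteria can be checked for one marking with a few set operations. The sketch below is an illustration under assumptions: the function name `accept_marking` is hypothetical, interpretations are represented by integer identifiers, and the single-root criterion is encoded as "at most one principal word that is not a dependent word", following its verbal statement.

```python
def accept_marking(d_k, R, prepositions):
    """Check connectedness, single root and preposition criteria for marking d_k."""
    F_k = set(d_k)
    R_k = {(x, y, t) for (x, y, t) in R if x in F_k and y in F_k}
    A_k = {x for x, _, _ in R_k}                      # principal words
    B_k = {y for _, y, _ in R_k}                      # dependent words
    connected = F_k <= (A_k | B_k)                    # every word takes part in a pair
    single_root = len(A_k - B_k) <= 1                 # at most one word without a head
    preps_ok = all(z in A_k and z in B_k for z in prepositions & F_k)
    return connected and single_root and preps_ok


# "Дом был без лифта": 0 = дом, 1 = был, 2 = без, 3 = лифта (toy identifiers).
R = {(1, 0, "_KNC_L"), (1, 2, "_KCP"), (2, 3, "_K_Pr_Nobj")}
good = accept_marking((0, 1, 2, 3), R, prepositions={2})

# Without the connection "без -> лифта" the preposition governs nothing.
R_broken = {(1, 0, "_KNC_L"), (1, 2, "_KCP")}
bad = accept_marking((0, 1, 2), R_broken, prepositions={2})
```

In the second case the marking is connected and has a single root, but the preposition is only a dependent word, so the preposition criterion rejects it.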




Building the dependency tree.

The pair $(F_k, R_k)$ describes a directed edge-labeled graph $G_k$. The set $F_k$ is the node set of this digraph, and $R_k$ is the set of labeled edges $(x, y, t)$, where the pair $(x, y)$ is an arc from $x$ to $y$ and $t$ is the label of the edge. The required dependency tree is a subgraph of the digraph $G_k$.

However, not every such subgraph is a tree of syntactic subordination (TSS). Whether a morphological marking of the sentence is acceptable, and which individual connections from the set $R_k$ are admissible, is decided by the following criteria.

Connectedness of the digraph defined by $F_k$ and by a subset of the connections $R_k$ that does not contradict the templates of the minimal structure schemes.

The in-degree of every non-root vertex of this digraph is equal to 1.

The correspondence of $R_k$ to a minimal structure scheme template $h$ is analyzed as follows. We form the set $Rm = \{Rm_i\}$, where each $Rm_i \subseteq R_k$ contains connections of a single type $t_i$, and this type is included in the template $h$:

$$Rm_i = \{(x, y, t) : (x, y, t) \in R_k,\ t = t_i,\ t_i \in h\}. \tag{12}$$

If $|Rm| < |h|$, the sentence does not correspond to $h$.

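Let $RM = \{rm_j\}$, where $RM \subseteq Rm_1 \times \dots \times Rm_i \times \dots \times Rm_l$ and $rm_j = ((x_1, y_1, t_1), \dots, (x_l, y_l, t_l))$ such that, if $l > 1$, $x_1 = x_2$ and $\forall i > 1\ \ x_{i+1} = y_i$.

The element $rm_j$ is the base for creating a TSS by the template $h$. Let $g = \{(x, y, t)\}$, where $(x, y, t)$ are the elements of the vector $rm_j$. The minor connections of the set $C$ must then be added:

$$C = \{(x, y, t) : (x, y, t) \in R_k,\ t \in T_a,\ \nexists (x', y', t') \in g : (y' = y \lor x' = x)\}. \tag{13}$$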
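The selection of base vectors from the product of the sets $Rm_i$ can be sketched as a filter over `itertools.product`. The chaining condition used here (the first two connections share their principal word, and from the second connection on each dependent word is the principal word of the next connection) is one reading of the definition of $RM$; the function name `base_chains` and the sample data are assumptions for illustration.

```python
from itertools import product


def base_chains(Rm):
    """Select the vectors rm_j from Rm_1 x ... x Rm_l that form a chain.

    Rm is a list [Rm_1, ..., Rm_l] of sets of triples (x, y, t), one set per
    rule of the template. With 0-based indices the condition reads:
    xs[0] == xs[1] and xs[i + 1] == ys[i] for i = 1 .. l - 2.
    """
    chains = []
    for rm in product(*Rm):
        xs = [c[0] for c in rm]
        ys = [c[1] for c in rm]
        ok = len(rm) == 1 or xs[0] == xs[1]
        ok = ok and all(xs[i + 1] == ys[i] for i in range(1, len(rm) - 1))
        if ok:
            chains.append(rm)
    return chains


# Template _KNC_L + _KCP + _K_Pr_Nobj for "Дом был без лифта" plus one
# spurious candidate headed by a word that is not part of the chain.
Rm = [
    {("был", "дом", "_KNC_L")},
    {("был", "без", "_KCP")},
    {("без", "лифта", "_K_Pr_Nobj"), ("на", "лифта", "_K_Pr_Nobj")},
]
chains = base_chains(Rm)
```

Only the vector in which the preposition introduced by the copula also governs the noun survives the filter.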
Let $g' = g \cup C$. If the digraph $(F_k, g')$ is not connected, a correct TSS cannot be created. Otherwise, the problem of vertices with in-degree greater than 1 has to be solved. For each such vertex only one incoming connection is kept: the one for which the length of the path from the root vertex to this vertex is maximal. If, for some vertex, $n$ competing connections lie on paths of identical length, syntactic homonymy is assumed: all $n$ connections are considered correct, and $n$ different TSS correspond to the pair $(F_k, g')$.

The list of syntactically connected word pairs is the union of the connections of all sets $g'$ recognized as correct, built on all $rm_j$ for every $F_k$ and every template $h$.
Experimental system of sentence parsing.

An experimental system of sentence parsing has been created as a software implementation of the developed algorithm. The input of the system is a text that consists of Russian words. Sentences end with punctuation marks («.», «!», «?», ...), and all words of a sentence are in lower case (except the first word). Text that is read from a file must be in the Windows-1251 encoding. Examples of system responses are given in Table 7.


Table 7 — Examples of responses 


Sentence | Principal word (x) | Dependent word (y) | Type of connection
Мы сидели на восьмом этаже. | сидели | Мы | _K1
 | сидели | на | Upr
 | на | этаже | Upr
 | этаже | восьмом | Sogl
Июльская ночь была тихая. | ночь | Июльская | Sogl
 | была | ночь | _KNC_L
 | была | тихая | _KCAdj
Парень был спортсменом. | был | Парень | _KNC_L
 | был | спортсменом | _K3_6
Дом был без лифта. | был | Дом | _KNC_L
 | был | без | _KCP
 | без | лифта | _K_Pr_Nobj
Курить воспрещалось. | воспрещалось | Курить | _K5
Дозвониться было проблемой. | было | Дозвониться | _KCI_Nom
 | было | проблемой | _K3_6
Промолчать было разумнее. | было | Промолчать | _KCI_Nom
 | было | разумнее | _KCAdj
Уступить было в правилах. | было | Уступить | _KCI_Nom
 | было | в | _KCP
 | в | правилах | _K_Pr_Nobj
RESUME
G.V. Dorokhina


The Algorithm of Syntactic Analysis Based on Grammatical Rules

The problem of automatic syntactic analysis of Russian texts is usually solved with statistical methods. However, solving particular text analysis tasks by statistical methods alone contributes little that is new to fundamental linguistics. At present, tools for automatic text analysis based on linguistic methods are not sufficiently developed, which is what makes this research relevant. The object of research is a simple sentence of the Russian language that is expanded but not complicated (semi-composite). The subject of research is the method of building a dependency tree for the sentence. The goal of research is to create a syntactic analysis algorithm based on grammatical rules.

The proposed sentence processing algorithm consists of the following stages.

1. Morphological analysis of wordforms.

2. Searching for pairs of potentially connected wordforms in the sentence.

3. Reducing the number of pairs of potentially connected wordforms.

4. Building the dependency tree.

We search for pairs of potentially connected wordforms in the sentence using grammatical rules. All types of syntactic connections are expressed as connections between pairs of words. The paper describes the rules that determine whether two words are potentially connected. The connections between the words that form the predicative base of the sentence are considered in detail. Such connections are called main connections; the remaining connections are called minor connections.

Each wordform in Russian may have several interpretations at the morphological level because of homonymy. A way to reduce the number of pairs of potentially connected wordforms is proposed; it reduces the computational complexity of the algorithm.

The dependency tree is built as follows. First we choose the sets of main connections needed to form the predicative base of the sentence. Then, for each such set, we build dependency trees by adding the minor connections.


The article was received by the editorial office on 05.06.2014.